Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016

Authors

  • Katsuhito Sudoh
  • Masaaki Nagata
Abstract

This paper presents our Chinese-to-Japanese patent machine translation system for WAT 2016 (Group ID: ntt) that uses syntactic pre-ordering over Chinese dependency structures. Chinese words are reordered by a learning-to-rank model based on pairwise classification to obtain word order close to Japanese. In this year’s system, two different machine translation methods are compared: traditional phrase-based statistical machine translation and recent sequence-to-sequence neural machine translation with an attention mechanism. Our pre-ordering showed a significant improvement over the phrase-based baseline, but, in contrast, it degraded the neural machine translation baseline.
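As a rough illustration of the pairwise learning-to-rank idea mentioned in the abstract, the sketch below trains a linear classifier on feature differences between pairs of nodes and then sorts the nodes by their scores. It is not the authors' system: the perceptron update, the `role=...` features, and the toy SVO-to-SOV training pairs are all assumptions chosen to keep the example small.

```python
# Illustrative sketch only: the classifier, features, and data below are
# assumptions, not the system described in the abstract.

def pairwise_diff(feats_a, feats_b):
    """Feature-difference vector used as input to the pairwise classifier."""
    keys = set(feats_a) | set(feats_b)
    return {k: feats_a.get(k, 0.0) - feats_b.get(k, 0.0) for k in keys}

def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def train_pairwise_perceptron(pairs, epochs=10):
    """pairs: (feats_a, feats_b, label); label is +1 if a should precede b, else -1."""
    weights = {}
    for _ in range(epochs):
        for feats_a, feats_b, label in pairs:
            diff = pairwise_diff(feats_a, feats_b)
            if label * dot(weights, diff) <= 0:  # misclassified pair: update
                for k, v in diff.items():
                    weights[k] = weights.get(k, 0.0) + label * v
    return weights

def reorder(nodes, weights):
    """Sort nodes so that higher-scoring nodes come first."""
    return sorted(nodes, key=lambda n: dot(weights, n["feats"]), reverse=True)

# Hypothetical features for a head verb and its two modifiers.
SUBJ = {"role=subj": 1.0}
VERB = {"role=head_verb": 1.0}
OBJ = {"role=obj": 1.0}

# Toy supervision: Japanese-like order places subject and object before the verb.
TRAIN_PAIRS = [(SUBJ, VERB, +1), (OBJ, VERB, +1), (SUBJ, OBJ, +1), (VERB, OBJ, -1)]

# Chinese SVO input: "我 吃 苹果" (I eat apples).
SENTENCE = [
    {"word": "我", "feats": SUBJ},
    {"word": "吃", "feats": VERB},
    {"word": "苹果", "feats": OBJ},
]

weights = train_pairwise_perceptron(TRAIN_PAIRS)
reordered = [n["word"] for n in reorder(SENTENCE, weights)]
```

Because the classifier is linear in the feature difference, a learned preference between two nodes reduces to comparing their individual scores, so a single sort yields a consistent order. A real pre-ordering model would use much richer features drawn from the dependency parse.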

Similar resources

Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2015

This paper presents our Chinese-to-Japanese patent machine translation system for WAT 2015 (Group ID: ntt) that uses syntactic pre-ordering over Chinese dependency structures. A head word and its modifier words are reordered by hand-written rules or a learning-to-rank model. Our system outperforms baseline phrase-based machine translations and competes with baseline tree-to-string machine transl...

System Description: Dependency-based Pre-ordering for Japanese-Chinese Machine Translation

This paper describes the Beijing Jiaotong University Japanese-Chinese machine translation system which participated in the 1st Workshop on Asian Translation (WAT 2014). We propose a preordering approach based on dependency parsing for Japanese-Chinese statistical machine translation (SMT). Our system achieves a BLEU of 24.12 and a RIBES of 79.48 on the Japanese-Chinese translation task in the o...

Improving Patent Translation using Bilingual Term Extraction and Re-tokenization for Chinese-Japanese

Unlike European languages, many Asian languages such as Chinese and Japanese do not have typographic word boundaries in their writing systems. Word segmentation (tokenization), which breaks sentences down into individual words (tokens), is normally treated as the first step for machine translation (MT). For Chinese and Japanese, different rules and segmentation tools lead to different segmentation results in differe...

The SAS Statistical Machine Translation System for WAT 2014

This paper describes the techniques and experimental results of SAS Institute Inc. in the WAT 2014 evaluation campaign. We participate in two subtasks of WAT 2014: the Chinese-to-Japanese track and the English-to-Japanese track. Our baseline system is the MOSES statistical machine translation toolkit. We propose syntactic reordering approaches for English to Japanese and Chinese to Japanese tran...

Otedama: Fast Rule-Based Pre-Ordering for Machine Translation

We present Otedama, a fast, open-source tool for rule-based syntactic pre-ordering, a well-established technique in statistical machine translation. Otedama implements both a learner for pre-ordering rules and a component for applying these rules to parsed sentences. Our system is compatible with several external parsers and capable of accommodating many source and all target languages...

Publication date: 2016